The aim of the study is to assist a real estate company that has a niche in purchasing properties to rent out short-term as part of their business model specifically within New York City. The company has already figured out that 2-bedroom properties are the best for investment; however, they do not know which zipcodes are the best to invest in. A data product is requested to build in order to help the company understand which zipcodes would generate the most profit on short term rentals within New York City.
For this purpose, publicly available data from Zillow and AirBnB are used:
Cost Data - Estimate of value for two-bedroom properties provided by Zillow
http://insideairbnb.com/get-the-data.html
Revenue Data - AirBnB dataset with relevant short-term rental information
https://www.zillow.com/research/data/
The Latest Cost Data – A historical estimated data of value for two-bedroom properties from 2008-01 to 2019-11, provided by Zillow
https://www.zillow.com/research/data/
All the percentages assumed above can be modified by user preference.
Data preparation:
library(tidyverse)
library(dplyr)
library(stringr)
Forecast:
library(forecast)
Visualization:
library(RColorBrewer)
library(plotly)
library(ggmap) library(knitr)
Dashboard:
library(shiny)
library(shinydashboard)
In order to fit the problem requirement and simply the future analysis, we filter the rows and select the relevant columns to keep all of the 2-bedroom properties in New York City for short-term rent.
setwd("D:/job search/capital one/data challenge/Data challenge - Elaine Ji")
airbnb <- read_csv("listings.csv")
zillow <- read_csv("Zip_Zhvi_2bedroom.csv")
#subset of airbnb
airbnb_subset <- airbnb %>%
filter(bedrooms == 2, minimum_nights < 31) %>%
select(id, last_scraped, zipcode, neighbourhood, neighbourhood_cleansed,
neighbourhood_group_cleansed, latitude, longitude, square_feet, room_type,
price, weekly_price,monthly_price, minimum_nights, maximum_nights)
#subset of zillow
zillow_subset <- zillow %>%
filter(City == "New York") %>%
select(-c(City, State, Metro, CountyName, SizeRank))
Data provides estimated historical median price for 2-bedroom homes in each zip code, captured between year 1996 and 2017 and spread monthly.
kableExtra::kable_styling(knitr::kable(head(zillow_subset), "html"),
bootstrap_options ="striped") %>%
kableExtra::scroll_box(width = "100%", height = "250px")
| RegionID | RegionName | 1996-04 | 1996-05 | 1996-06 | 1996-07 | 1996-08 | 1996-09 | 1996-10 | 1996-11 | 1996-12 | 1997-01 | 1997-02 | 1997-03 | 1997-04 | 1997-05 | 1997-06 | 1997-07 | 1997-08 | 1997-09 | 1997-10 | 1997-11 | 1997-12 | 1998-01 | 1998-02 | 1998-03 | 1998-04 | 1998-05 | 1998-06 | 1998-07 | 1998-08 | 1998-09 | 1998-10 | 1998-11 | 1998-12 | 1999-01 | 1999-02 | 1999-03 | 1999-04 | 1999-05 | 1999-06 | 1999-07 | 1999-08 | 1999-09 | 1999-10 | 1999-11 | 1999-12 | 2000-01 | 2000-02 | 2000-03 | 2000-04 | 2000-05 | 2000-06 | 2000-07 | 2000-08 | 2000-09 | 2000-10 | 2000-11 | 2000-12 | 2001-01 | 2001-02 | 2001-03 | 2001-04 | 2001-05 | 2001-06 | 2001-07 | 2001-08 | 2001-09 | 2001-10 | 2001-11 | 2001-12 | 2002-01 | 2002-02 | 2002-03 | 2002-04 | 2002-05 | 2002-06 | 2002-07 | 2002-08 | 2002-09 | 2002-10 | 2002-11 | 2002-12 | 2003-01 | 2003-02 | 2003-03 | 2003-04 | 2003-05 | 2003-06 | 2003-07 | 2003-08 | 2003-09 | 2003-10 | 2003-11 | 2003-12 | 2004-01 | 2004-02 | 2004-03 | 2004-04 | 2004-05 | 2004-06 | 2004-07 | 2004-08 | 2004-09 | 2004-10 | 2004-11 | 2004-12 | 2005-01 | 2005-02 | 2005-03 | 2005-04 | 2005-05 | 2005-06 | 2005-07 | 2005-08 | 2005-09 | 2005-10 | 2005-11 | 2005-12 | 2006-01 | 2006-02 | 2006-03 | 2006-04 | 2006-05 | 2006-06 | 2006-07 | 2006-08 | 2006-09 | 2006-10 | 2006-11 | 2006-12 | 2007-01 | 2007-02 | 2007-03 | 2007-04 | 2007-05 | 2007-06 | 2007-07 | 2007-08 | 2007-09 | 2007-10 | 2007-11 | 2007-12 | 2008-01 | 2008-02 | 2008-03 | 2008-04 | 2008-05 | 2008-06 | 2008-07 | 2008-08 | 2008-09 | 2008-10 | 2008-11 | 2008-12 | 2009-01 | 2009-02 | 2009-03 | 2009-04 | 2009-05 | 2009-06 | 2009-07 | 2009-08 | 2009-09 | 2009-10 | 2009-11 | 2009-12 | 2010-01 | 2010-02 | 2010-03 | 2010-04 | 2010-05 | 2010-06 | 2010-07 | 2010-08 | 2010-09 | 2010-10 | 2010-11 | 2010-12 | 2011-01 | 2011-02 | 2011-03 | 2011-04 | 2011-05 | 2011-06 | 2011-07 | 2011-08 | 2011-09 | 2011-10 | 2011-11 | 2011-12 | 2012-01 | 2012-02 | 2012-03 | 2012-04 | 2012-05 | 2012-06 | 2012-07 | 2012-08 | 2012-09 | 2012-10 | 2012-11 | 2012-12 | 2013-01 | 2013-02 | 2013-03 | 2013-04 | 2013-05 | 2013-06 | 2013-07 | 2013-08 | 2013-09 | 2013-10 | 2013-11 | 2013-12 | 2014-01 | 2014-02 | 2014-03 | 2014-04 | 2014-05 | 2014-06 | 2014-07 | 2014-08 | 2014-09 | 2014-10 | 2014-11 | 2014-12 | 2015-01 | 2015-02 | 2015-03 | 2015-04 | 2015-05 | 2015-06 | 2015-07 | 2015-08 | 2015-09 | 2015-10 | 2015-11 | 2015-12 | 2016-01 | 2016-02 | 2016-03 | 2016-04 | 2016-05 | 2016-06 | 2016-07 | 2016-08 | 2016-09 | 2016-10 | 2016-11 | 2016-12 | 2017-01 | 2017-02 | 2017-03 | 2017-04 | 2017-05 | 2017-06 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 61639 | 10025 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 798600 | 798800 | 801500 | 804600 | 814900 | 828300 | 835700 | 849300 | 858100 | 854000 | 834800 | 821700 | 830300 | 853700 | 868300 | 875200 | 882200 | 892400 | 905000 | 924000 | 934400 | 932100 | 927500 | 923600 | 907900 | 890900 | 883400 | 896100 | 923900 | 952900 | 964600 | 972500 | 973800 | 973400 | 966500 | 966800 | 967100 | 974800 | 976800 | 976100 | 973700 | 974500 | 973200 | 966400 | 950400 | 933300 | 920900 | 909400 | 891400 | 873300 | 858800 | 850200 | 842800 | 834000 | 828800 | 821400 | 813900 | 813300 | 821500 | 831700 | 845100 | 854500 | 858900 | 859200 | 863500 | 876000 | 886100 | 890000 | 894200 | 901800 | 909500 | 913300 | 907400 | 900000 | 897700 | 896300 | 892300 | 890400 | 888600 | 891700 | 899500 | 904400 | 908200 | 914000 | 915100 | 912300 | 914000 | 921100 | 923300 | 917300 | 915000 | 922800 | 929100 | 937700 | 955700 | 974200 | 995500 | 1019500 | 1035100 | 1054900 | 1079900 | 1092600 | 1103500 | 1118800 | 1139300 | 1154600 | 1144100 | 1120300 | 1125500 | 1136000 | 1135100 | 1130000 | 1138200 | 1153700 | 1174800 | 1185400 | 1188400 | 1189700 | 1193700 | 1199900 | 1201400 | 1202600 | 1214200 | 1235200 | 1258000 | 1287700 | 1307200 | 1313900 | 1317100 | 1327400 | 1338800 | 1350400 | 1356600 | 1358500 | 1364000 | 1373300 | 1382600 | 1374400 | 1364100 | 1366300 | 1354800 | 1327500 | 1317300 | 1333700 | 1352100 | 1390000 | 1431000 |
| 61637 | 10023 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1526800 | 1424500 | 1346600 | 1331300 | 1322500 | 1289300 | 1265400 | 1249700 | 1241100 | 1232700 | 1225500 | 1228200 | 1252600 | 1266100 | 1288700 | 1308100 | 1333000 | 1356400 | 1362000 | 1353600 | 1364000 | 1373900 | 1389600 | 1401600 | 1404100 | 1415800 | 1432400 | 1455400 | 1474200 | 1462300 | 1438300 | 1435500 | 1427800 | 1411200 | 1407400 | 1419700 | 1457400 | 1500800 | 1524900 | 1537800 | 1558700 | 1586100 | 1602300 | 1621100 | 1639300 | 1657400 | 1657400 | 1656100 | 1649400 | 1643400 | 1632400 | 1618200 | 1588300 | 1543600 | 1500800 | 1464200 | 1426100 | 1387300 | 1362600 | 1351700 | 1344300 | 1331800 | 1334800 | 1314200 | 1271900 | 1252300 | 1262300 | 1279200 | 1309000 | 1335300 | 1353800 | 1366400 | 1372100 | 1381300 | 1385000 | 1388100 | 1399100 | 1399800 | 1389300 | 1384700 | 1380900 | 1367900 | 1365400 | 1375100 | 1380400 | 1377000 | 1375100 | 1379000 | 1395200 | 1414500 | 1419000 | 1403100 | 1383200 | 1376700 | 1378200 | 1378700 | 1375900 | 1366700 | 1365500 | 1382200 | 1404700 | 1428000 | 1445700 | 1452900 | 1460100 | 1484400 | 1508400 | 1522800 | 1538300 | 1568600 | 1597400 | 1622900 | 1654300 | 1684600 | 1713000 | 1728800 | 1736100 | 1745900 | 1753800 | 1736600 | 1730400 | 1734500 | 1728700 | 1720800 | 1717700 | 1700100 | 1680400 | 1676400 | 1685600 | 1708100 | 1730400 | 1751800 | 1778300 | 1810400 | 1831600 | 1844400 | 1861600 | 1889600 | 1901500 | 1895300 | 1890200 | 1898400 | 1924500 | 1967300 | 1993500 | 1980700 | 1960900 | 1951300 | 1937800 | 1929800 | 1955000 | 2022400 | 2095000 | 2142300 |
| 61703 | 10128 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 926300 | 940500 | 953700 | 966800 | 972300 | 978900 | 984700 | 984300 | 980400 | 993200 | 1016800 | 1025500 | 1018900 | 1024300 | 1042400 | 1060800 | 1063400 | 1049400 | 1029400 | 1017800 | 1020500 | 1033500 | 1049700 | 1067900 | 1086700 | 1100500 | 1115100 | 1142000 | 1178800 | 1196200 | 1200200 | 1218700 | 1243300 | 1241000 | 1226400 | 1218200 | 1200800 | 1188000 | 1179600 | 1157300 | 1118700 | 1085800 | 1050800 | 1019200 | 993700 | 981400 | 971200 | 961700 | 957500 | 959300 | 969400 | 985400 | 983700 | 983400 | 1007600 | 1023000 | 1010500 | 1006800 | 1019400 | 1026900 | 1023800 | 1014900 | 992800 | 974500 | 965200 | 963200 | 959200 | 964400 | 975700 | 988600 | 998000 | 1019700 | 1045500 | 1064200 | 1066000 | 1057500 | 1058600 | 1069600 | 1068700 | 1054500 | 1045000 | 1043400 | 1050300 | 1050500 | 1050700 | 1059700 | 1079600 | 1091600 | 1106100 | 1121700 | 1139900 | 1153100 | 1174400 | 1182500 | 1170800 | 1166000 | 1172700 | 1171800 | 1173000 | 1187000 | 1200200 | 1209600 | 1214800 | 1218800 | 1221200 | 1230500 | 1243500 | 1259000 | 1277400 | 1296300 | 1305600 | 1310800 | 1313400 | 1313500 | 1314500 | 1328000 | 1347900 | 1376100 | 1409500 | 1431400 | 1441600 | 1453100 | 1468100 | 1492000 | 1518100 | 1531300 | 1525300 | 1509000 | 1520400 | 1543900 | 1547400 | 1526000 | 1523700 | 1527200 | 1541600 | 1557800 | 1582900 | 1598900 | 1646100 | 1720500 | 1787100 |
| 61625 | 10011 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1384000 | 1325500 | 1297400 | 1313500 | 1332400 | 1314500 | 1296700 | 1295100 | 1280000 | 1255300 | 1250800 | 1261200 | 1291100 | 1333900 | 1358500 | 1382600 | 1438800 | 1473100 | 1444500 | 1407100 | 1400300 | 1401600 | 1400500 | 1393700 | 1396100 | 1413200 | 1428200 | 1442400 | 1459000 | 1466500 | 1462000 | 1464300 | 1483300 | 1514600 | 1541900 | 1563500 | 1591300 | 1609900 | 1630300 | 1639200 | 1626700 | 1618900 | 1635400 | 1642000 | 1648300 | 1671100 | 1681400 | 1691600 | 1706200 | 1711100 | 1714100 | 1733400 | 1737700 | 1719900 | 1678200 | 1628400 | 1578300 | 1528400 | 1489200 | 1467000 | 1449400 | 1432100 | 1422700 | 1402200 | 1378600 | 1372300 | 1393100 | 1430000 | 1462100 | 1476800 | 1479000 | 1476600 | 1461200 | 1448300 | 1441600 | 1444300 | 1438600 | 1434100 | 1439300 | 1437700 | 1430300 | 1426800 | 1427800 | 1424600 | 1432800 | 1456500 | 1485100 | 1500200 | 1509600 | 1518500 | 1530800 | 1538000 | 1530500 | 1524500 | 1546500 | 1574800 | 1599600 | 1622500 | 1639000 | 1656100 | 1684600 | 1703000 | 1710000 | 1734300 | 1765200 | 1786000 | 1810700 | 1841500 | 1867600 | 1882200 | 1897000 | 1917300 | 1963400 | 1999200 | 2003500 | 2007900 | 2027700 | 2043500 | 2056300 | 2064500 | 2066000 | 2057900 | 2031300 | 1999000 | 1979200 | 1982900 | 2001600 | 2014700 | 2023500 | 2055300 | 2078300 | 2083600 | 2088800 | 2110600 | 2127500 | 2168900 | 2204700 | 2216100 | 2212500 | 2222600 | 2231900 | 2250800 | 2285200 | 2329100 | 2354000 | 2355500 | 2352200 | 2332100 | 2313300 | 2319600 | 2342100 | 2365900 | 2419700 | 2480400 |
| 61617 | 10003 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1387000 | 1315300 | 1265900 | 1217100 | 1167500 | 1125800 | 1106100 | 1105900 | 1119200 | 1130100 | 1127600 | 1129900 | 1168000 | 1214000 | 1236900 | 1295200 | 1398900 | 1431700 | 1377700 | 1343100 | 1351900 | 1354000 | 1358200 | 1374800 | 1378500 | 1378400 | 1418300 | 1441000 | 1427900 | 1401900 | 1391800 | 1379800 | 1371400 | 1367700 | 1400900 | 1441100 | 1475000 | 1491500 | 1504700 | 1499200 | 1481900 | 1478800 | 1509000 | 1532700 | 1524300 | 1520600 | 1533200 | 1556600 | 1586000 | 1595200 | 1578900 | 1587000 | 1627200 | 1648100 | 1614400 | 1553000 | 1486000 | 1417000 | 1375100 | 1361300 | 1333400 | 1299700 | 1296500 | 1273900 | 1227800 | 1202600 | 1207100 | 1220100 | 1252400 | 1285700 | 1278300 | 1279100 | 1326700 | 1376500 | 1368900 | 1366000 | 1381300 | 1380700 | 1368500 | 1372700 | 1378000 | 1361700 | 1357800 | 1364400 | 1358000 | 1329800 | 1317800 | 1333200 | 1348500 | 1349500 | 1352200 | 1354100 | 1351900 | 1364200 | 1376600 | 1384200 | 1387900 | 1404200 | 1419200 | 1425700 | 1435300 | 1460300 | 1466500 | 1458100 | 1465500 | 1502300 | 1563900 | 1592000 | 1596200 | 1625200 | 1672300 | 1699500 | 1718500 | 1734300 | 1748600 | 1763700 | 1766700 | 1772200 | 1762700 | 1736700 | 1712400 | 1703700 | 1702500 | 1708800 | 1716300 | 1720500 | 1721800 | 1741800 | 1775800 | 1796500 | 1821500 | 1870100 | 1901000 | 1904900 | 1914000 | 1926400 | 1932200 | 1936700 | 1945200 | 1935600 | 1911200 | 1918700 | 1947600 | 1951300 | 1932800 | 1930400 | 1937500 | 1935100 | 1915700 | 1916500 | 1965700 | 2045300 | 2109100 | 2147000 |
| 62012 | 11201 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 600400 | 574900 | 564800 | 556600 | 545700 | 534000 | 521300 | 521300 | 537600 | 559100 | 580700 | 608000 | 637400 | 661300 | 682100 | 694800 | 698100 | 696100 | 700400 | 704200 | 702800 | 697900 | 693300 | 692800 | 694400 | 694700 | 692000 | 693100 | 704600 | 723000 | 736600 | 742800 | 747500 | 750300 | 753800 | 758900 | 764600 | 767300 | 769400 | 772400 | 779100 | 785600 | 786700 | 782000 | 782700 | 785800 | 784600 | 785100 | 795300 | 799300 | 798000 | 801800 | 808000 | 808200 | 805600 | 805100 | 813000 | 825500 | 835700 | 839100 | 836700 | 836900 | 843900 | 852300 | 857400 | 859900 | 858700 | 857100 | 856800 | 863400 | 875900 | 891000 | 909500 | 937200 | 965400 | 992200 | 1008700 | 1014300 | 1020800 | 1040000 | 1058000 | 1074200 | 1098300 | 1124500 | 1140900 | 1156900 | 1182000 | 1207600 | 1223800 | 1231600 | 1240500 | 1253600 | 1264500 | 1270500 | 1276300 | 1289600 | 1303800 | 1305300 | 1298900 | 1301000 | 1314200 | 1322800 | 1320500 | 1318800 | 1325600 | 1333000 | 1334800 | 1333100 | 1334600 | 1339000 | 1343000 | 1340200 | 1338700 | 1350600 | 1375600 | 1390200 | 1398100 | 1399900 | 1400500 | 1407300 | 1420700 |
1. Missing values
Median price for early years (1996-2013) has plenty of NAs as shown in the table below. Steps are taken in the following section to exclude columns with NAs when trend calculation and time series forecast.
cost_missing_val <- sapply(zillow_subset, function(x) sum(length(which(is.na(x)))))
kable(as.data.frame(cost_missing_val)) %>%
kableExtra::kable_styling(bootstrap_options = "striped") %>%
kableExtra::scroll_box(width = "100%", height = "250px")
| cost_missing_val | |
|---|---|
| RegionID | 0 |
| RegionName | 0 |
| 1996-04 | 17 |
| 1996-05 | 17 |
| 1996-06 | 17 |
| 1996-07 | 17 |
| 1996-08 | 17 |
| 1996-09 | 17 |
| 1996-10 | 17 |
| 1996-11 | 17 |
| 1996-12 | 17 |
| 1997-01 | 17 |
| 1997-02 | 17 |
| 1997-03 | 17 |
| 1997-04 | 17 |
| 1997-05 | 17 |
| 1997-06 | 17 |
| 1997-07 | 17 |
| 1997-08 | 17 |
| 1997-09 | 17 |
| 1997-10 | 17 |
| 1997-11 | 17 |
| 1997-12 | 17 |
| 1998-01 | 17 |
| 1998-02 | 17 |
| 1998-03 | 17 |
| 1998-04 | 17 |
| 1998-05 | 17 |
| 1998-06 | 17 |
| 1998-07 | 17 |
| 1998-08 | 17 |
| 1998-09 | 17 |
| 1998-10 | 16 |
| 1998-11 | 16 |
| 1998-12 | 16 |
| 1999-01 | 16 |
| 1999-02 | 16 |
| 1999-03 | 16 |
| 1999-04 | 16 |
| 1999-05 | 16 |
| 1999-06 | 16 |
| 1999-07 | 16 |
| 1999-08 | 16 |
| 1999-09 | 16 |
| 1999-10 | 16 |
| 1999-11 | 16 |
| 1999-12 | 16 |
| 2000-01 | 16 |
| 2000-02 | 16 |
| 2000-03 | 16 |
| 2000-04 | 16 |
| 2000-05 | 16 |
| 2000-06 | 16 |
| 2000-07 | 16 |
| 2000-08 | 16 |
| 2000-09 | 16 |
| 2000-10 | 16 |
| 2000-11 | 16 |
| 2000-12 | 16 |
| 2001-01 | 16 |
| 2001-02 | 16 |
| 2001-03 | 16 |
| 2001-04 | 16 |
| 2001-05 | 16 |
| 2001-06 | 16 |
| 2001-07 | 16 |
| 2001-08 | 16 |
| 2001-09 | 16 |
| 2001-10 | 16 |
| 2001-11 | 16 |
| 2001-12 | 16 |
| 2002-01 | 16 |
| 2002-02 | 17 |
| 2002-03 | 17 |
| 2002-04 | 17 |
| 2002-05 | 17 |
| 2002-06 | 17 |
| 2002-07 | 17 |
| 2002-08 | 17 |
| 2002-09 | 17 |
| 2002-10 | 17 |
| 2002-11 | 17 |
| 2002-12 | 17 |
| 2003-01 | 17 |
| 2003-02 | 17 |
| 2003-03 | 17 |
| 2003-04 | 17 |
| 2003-05 | 17 |
| 2003-06 | 17 |
| 2003-07 | 17 |
| 2003-08 | 17 |
| 2003-09 | 17 |
| 2003-10 | 17 |
| 2003-11 | 16 |
| 2003-12 | 14 |
| 2004-01 | 13 |
| 2004-02 | 13 |
| 2004-03 | 12 |
| 2004-04 | 12 |
| 2004-05 | 12 |
| 2004-06 | 10 |
| 2004-07 | 10 |
| 2004-08 | 10 |
| 2004-09 | 5 |
| 2004-10 | 4 |
| 2004-11 | 4 |
| 2004-12 | 4 |
| 2005-01 | 4 |
| 2005-02 | 4 |
| 2005-03 | 4 |
| 2005-04 | 3 |
| 2005-05 | 3 |
| 2005-06 | 3 |
| 2005-07 | 3 |
| 2005-08 | 3 |
| 2005-09 | 3 |
| 2005-10 | 1 |
| 2005-11 | 1 |
| 2005-12 | 1 |
| 2006-01 | 1 |
| 2006-02 | 1 |
| 2006-03 | 1 |
| 2006-04 | 1 |
| 2006-05 | 1 |
| 2006-06 | 1 |
| 2006-07 | 1 |
| 2006-08 | 1 |
| 2006-09 | 1 |
| 2006-10 | 1 |
| 2006-11 | 1 |
| 2006-12 | 1 |
| 2007-01 | 1 |
| 2007-02 | 1 |
| 2007-03 | 1 |
| 2007-04 | 1 |
| 2007-05 | 1 |
| 2007-06 | 0 |
| 2007-07 | 0 |
| 2007-08 | 0 |
| 2007-09 | 0 |
| 2007-10 | 0 |
| 2007-11 | 0 |
| 2007-12 | 0 |
| 2008-01 | 0 |
| 2008-02 | 0 |
| 2008-03 | 0 |
| 2008-04 | 0 |
| 2008-05 | 0 |
| 2008-06 | 0 |
| 2008-07 | 0 |
| 2008-08 | 0 |
| 2008-09 | 0 |
| 2008-10 | 0 |
| 2008-11 | 0 |
| 2008-12 | 0 |
| 2009-01 | 0 |
| 2009-02 | 0 |
| 2009-03 | 0 |
| 2009-04 | 0 |
| 2009-05 | 0 |
| 2009-06 | 0 |
| 2009-07 | 0 |
| 2009-08 | 0 |
| 2009-09 | 0 |
| 2009-10 | 0 |
| 2009-11 | 0 |
| 2009-12 | 0 |
| 2010-01 | 0 |
| 2010-02 | 0 |
| 2010-03 | 0 |
| 2010-04 | 0 |
| 2010-05 | 0 |
| 2010-06 | 0 |
| 2010-07 | 0 |
| 2010-08 | 0 |
| 2010-09 | 0 |
| 2010-10 | 0 |
| 2010-11 | 0 |
| 2010-12 | 0 |
| 2011-01 | 0 |
| 2011-02 | 0 |
| 2011-03 | 0 |
| 2011-04 | 0 |
| 2011-05 | 0 |
| 2011-06 | 0 |
| 2011-07 | 0 |
| 2011-08 | 0 |
| 2011-09 | 0 |
| 2011-10 | 0 |
| 2011-11 | 0 |
| 2011-12 | 0 |
| 2012-01 | 0 |
| 2012-02 | 0 |
| 2012-03 | 0 |
| 2012-04 | 0 |
| 2012-05 | 0 |
| 2012-06 | 0 |
| 2012-07 | 0 |
| 2012-08 | 0 |
| 2012-09 | 0 |
| 2012-10 | 0 |
| 2012-11 | 0 |
| 2012-12 | 0 |
| 2013-01 | 0 |
| 2013-02 | 0 |
| 2013-03 | 0 |
| 2013-04 | 0 |
| 2013-05 | 0 |
| 2013-06 | 0 |
| 2013-07 | 0 |
| 2013-08 | 0 |
| 2013-09 | 0 |
| 2013-10 | 0 |
| 2013-11 | 0 |
| 2013-12 | 0 |
| 2014-01 | 0 |
| 2014-02 | 0 |
| 2014-03 | 0 |
| 2014-04 | 0 |
| 2014-05 | 0 |
| 2014-06 | 0 |
| 2014-07 | 0 |
| 2014-08 | 0 |
| 2014-09 | 0 |
| 2014-10 | 0 |
| 2014-11 | 0 |
| 2014-12 | 0 |
| 2015-01 | 0 |
| 2015-02 | 0 |
| 2015-03 | 0 |
| 2015-04 | 0 |
| 2015-05 | 0 |
| 2015-06 | 0 |
| 2015-07 | 0 |
| 2015-08 | 0 |
| 2015-09 | 0 |
| 2015-10 | 0 |
| 2015-11 | 0 |
| 2015-12 | 0 |
| 2016-01 | 0 |
| 2016-02 | 0 |
| 2016-03 | 0 |
| 2016-04 | 0 |
| 2016-05 | 0 |
| 2016-06 | 0 |
| 2016-07 | 0 |
| 2016-08 | 0 |
| 2016-09 | 0 |
| 2016-10 | 0 |
| 2016-11 | 0 |
| 2016-12 | 0 |
| 2017-01 | 0 |
| 2017-02 | 0 |
| 2017-03 | 0 |
| 2017-04 | 0 |
| 2017-05 | 0 |
| 2017-06 | 0 |
For future analysis, we filter the rows and select the relevant columns to keep the values of properties in New York City at the scraped date.
2. Missing zipcodes of NYC
New York hosts 176 zipcodes. However, there are only 25 zipcodes of NYC recorded in this data set. We are going to settle with this data for now. New data source can be connected in the future to account for rest of the zipcodes.
Revenue data contains information including details about the properties including location, number of bedrooms, room types, price and other details for stay.
kable(head(airbnb)) %>%
kableExtra::kable_styling(bootstrap_options ="striped") %>%
kableExtra::scroll_box(width = "100%", height = "250px")
| id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | notes | transit | access | interaction | house_rules | thumbnail_url | medium_url | picture_url | xl_picture_url | host_id | host_url | host_name | host_since | host_location | host_about | host_response_time | host_response_rate | host_acceptance_rate | host_is_superhost | host_thumbnail_url | host_picture_url | host_neighbourhood | host_listings_count | host_total_listings_count | host_verifications | host_has_profile_pic | host_identity_verified | street | neighbourhood | neighbourhood_cleansed | neighbourhood_group_cleansed | city | state | zipcode | market | smart_location | country_code | country | latitude | longitude | is_location_exact | property_type | room_type | accommodates | bathrooms | bedrooms | beds | bed_type | amenities | square_feet | price | weekly_price | monthly_price | security_deposit | cleaning_fee | guests_included | extra_people | minimum_nights | maximum_nights | minimum_minimum_nights | maximum_minimum_nights | minimum_maximum_nights | maximum_maximum_nights | minimum_nights_avg_ntm | maximum_nights_avg_ntm | calendar_updated | has_availability | availability_30 | availability_60 | availability_90 | availability_365 | calendar_last_scraped | number_of_reviews | number_of_reviews_ltm | first_review | last_review | review_scores_rating | review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | is_business_travel_ready | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | calculated_host_listings_count_entire_homes | calculated_host_listings_count_private_rooms | calculated_host_listings_count_shared_rooms | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2539 | https://www.airbnb.com/rooms/2539 | 2.019071e+13 | 2019-07-09 | Clean & quiet apt home by the park | Renovated apt home in elevator building. | Spacious, renovated, and clean apt home, one block to F train, 25 minutes to lower Manhatten | Renovated apt home in elevator building. Spacious, renovated, and clean apt home, one block to F train, 25 minutes to lower Manhatten Close to Prospect Park and Historic Ditmas Park Very close to F and G trains and Express bus into NY. The B and Q are closeby also. If this room is unavailable on your desired dates, check out our other rooms, such as: https://www.airbnb.com/rooms/10267242 | none | Close to Prospect Park and Historic Ditmas Park | If this room is unavailable on your desired dates, check out our other rooms, such as: https://www.airbnb.com/rooms/10267242 | Very close to F and G trains and Express bus into NY. The B and Q are closeby also. | NA | NA | -The security and comfort of all our guests is important to us! Therefore, no one is permitted to check in without first emailing a clear government issued photo ID, acceptable to manager. Instructions will be provided in the house manual. -No eating, drinking or storage of food in the room. -No smoking. -Illicit drug use is strictly forbidden. -Quiet hours after 10pm and before 8am, This is respectful for our neighbors and other guests. -Please clean up after yourself when using the kitchen and bath. -Please lock the doors and close the windows when exiting the home. -Please indicate and pay for the correct number of guests. Failure to do so will cancel your reservation with no refund due. -Please remove your shoes when entering the apt home. Thanks for choosing our home! | NA | NA | https://a0.muscache.com/im/pictures/3949d073-a02e-4ebc-aa9c-ac74f00eaa1f.jpg?aki_policy=large | NA | 2787 | https://www.airbnb.com/users/show/2787 | John | 2008-09-07 | New York, New York, United States | Educated professional living in Brooklyn. I love meeting new people, running, hiking, fine foods, traveling, etc. One of my favorite trips was spending New Year’s Eve in London on the Thames River. Big Ben, spectacular fireworks and light show; and fun times with a good crowd of international tourists. A most memorable night and trip! Also, I generally approach life with a positive attitude. I look forward to meeting you. | within an hour | 100% | N/A | FALSE | https://a0.muscache.com/im/pictures/8674565a-758d-476b-a580-4d99ea9baab9.jpg?aki_policy=profile_small | https://a0.muscache.com/im/pictures/8674565a-758d-476b-a580-4d99ea9baab9.jpg?aki_policy=profile_x_medium | Gravesend | 6 | 6 | [‘email’, ‘phone’, ‘reviews’, ‘kba’] | TRUE | TRUE | Brooklyn , NY, United States | Brooklyn | Kensington | Brooklyn | Brooklyn | NY | 11218 | New York | Brooklyn , NY | US | United States | 40.64749 | -73.97237 | FALSE | Apartment | Private room | 2 | 1 | 1 | 1 | Real Bed | {TV,“Cable TV”,Internet,Wifi,“Wheelchair accessible”,Kitchen,“Free parking on premises”,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,“Suitable for events”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,“Safety card”,“Fire extinguisher”,Essentials,Shampoo,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”,“Self check-in”,Keypad,“Outlet covers”,“Hot water”,“Bed linens”,“Extra pillows and blankets”,Microwave,“Coffee maker”,Refrigerator,“Dishes and silverware”,“Cooking basics”,Oven,Stove,“Luggage dropoff allowed”,“Long term stays allowed”,“Cleaning before checkout”} | NA | $149.00 | $299.00 | $999.00 | $100.00 | $25.00 | 1 | $35.00 | 1 | 730 | 1 | 1 | 730 | 730 | 1 | 730 | 3 weeks ago | TRUE | 30 | 60 | 90 | 365 | 2019-07-09 | 9 | 2 | 2015-12-04 | 2018-10-19 | 98 | 10 | 10 | 10 | 10 | 10 | 10 | FALSE | NA | NA | FALSE | FALSE | moderate | FALSE | FALSE | 6 | 0 | 5 | 1 | 0.21 |
| 2595 | https://www.airbnb.com/rooms/2595 | 2.019071e+13 | 2019-07-09 | Skylit Midtown Castle | Find your romantic getaway to this beautiful, spacious skylit studio in the heart of Midtown, Manhattan. STUNNING SKYLIT STUDIO / 1 BED + SINGLE / FULL BATH / FULL KITCHEN / FIREPLACE / CENTRALLY LOCATED / WiFi + APPLE TV / SHEETS + TOWELS |
|
Find your romantic getaway to this beautiful, spacious skylit studio in the heart of Midtown, Manhattan. STUNNING SKYLIT STUDIO / 1 BED + SINGLE / FULL BATH / FULL KITCHEN / FIREPLACE / CENTRALLY LOCATED / WiFi + APPLE TV / SHEETS + TOWELS - Spacious (500+ft²), immaculate and nicely furnished & designed studio. - Tuck yourself into the ultra comfortable bed under the skylight. Fall in love with a myriad of bright lights in the city night sky. - Single-sized bed/convertible floor mattress with luxury bedding (available upon request). - Gorgeous pyramid skylight with amazing diffused natural light, stunning architectural details, soaring high vaulted ceilings, exposed brick, wood burning fireplace, floor seating area with natural zafu cushions, modern style mixed with eclectic art & antique treasures, large full bath, newly renovated kitchen, air conditioning/heat, high speed WiFi Internet, and Apple TV. - Centrally located in the heart of Midtown Manhattan just a few blocks from all s | none | Centrally located in the heart of Manhattan just a few blocks from all subway connections in the very desirable Midtown location a few minutes walk to Times Square, the Theater District, Bryant Park and Herald Square. | NA | Apartment is located on 37th Street between 5th & 6th Avenue, just a few blocks from all subway connections. Closest Subways (in order of proximity to apartment (Website hidden by Airbnb) W: 34th Street & 6th Avenu (Website hidden by Airbnb) 3: 34th Street & 7th Avenue 7: 42nd & 5th Avenu (Website hidden by Airbnb) S: 42nd Street between Park & Lexington Avenue (Website hidden by Airbnb) E: 34th Street and 8th Avenue If coming by car, there is a parking garage on the block and free street parking. | Guests have full access to the kitchen, bathroom and living spaces. The closets are private/off limits. | I am a Sound Therapy Practitioner and Kundalini Yoga & Meditation teacher. I work with energy and sound for relaxation and healing, using Symphonic gong, singing bowls, tuning forks, drums, voice and other instruments. Sound relaxation sessions and/or personalized Kundalini Yoga sessions are available in the space upon request. Individual, couples or group sessions available. Licensed acupuncture and massage also available upon request. Please inquire. I welcome my guests at the apartment for check-in, or alternatively, a self check-in can be arranged. Once you are settled in, I am just a phone call, text or email away, should you have any questions, concerns or issues during your stay. My desire is that you have a smooth arrival and amazing stay here. | Make yourself at home, respect the space and the neighbors. No pets, no smoking and no unauthorized guests. | NA | NA | https://a0.muscache.com/im/pictures/f0813a11-40b2-489e-8217-89a2e1637830.jpg?aki_policy=large | NA | 2845 | https://www.airbnb.com/users/show/2845 | Jennifer | 2008-09-09 | New York, New York, United States | A New Yorker since 2000! My passion is creating beautiful, unique spaces where unforgettable memories are made. It’s my pleasure to host people from around the world and meet new faces. Welcome travelers! I am a Sound Therapy Practitioner and Kundalini Yoga & Meditation teacher. I work with energy and sound for relaxation and healing, using Symphonic gong, singing bowls, tuning forks, drums, voice and other instruments. | within a few hours | 87% | N/A | FALSE | https://a0.muscache.com/im/users/2845/profile_pic/1259095067/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/2845/profile_pic/1259095067/original.jpg?aki_policy=profile_x_medium | Midtown | 5 | 5 | [‘email’, ‘phone’, ‘reviews’, ‘kba’, ‘work_email’] | TRUE | TRUE | New York, NY, United States | Manhattan | Midtown | Manhattan | New York | NY | 10018 | New York | New York, NY | US | United States | 40.75362 | -73.98377 | FALSE | Apartment | Entire home/apt | 2 | 1 | 0 | 1 | Real Bed | {TV,Wifi,“Air conditioning”,Kitchen,“Paid parking off premises”,“Free street parking”,“Indoor fireplace”,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self check-in”,Keypad,“Private living room”,Bathtub,“Hot water”,“Bed linens”,“Extra pillows and blankets”,“Ethernet connection”,“Coffee maker”,Refrigerator,“Dishes and silverware”,“Cooking basics”,Oven,Stove,“Luggage dropoff allowed”,“Long term stays allowed”,“Cleaning before checkout”,“Wide entrance for guests”,“Flat path to guest entrance”,“Well-lit path to entrance”,“No stairs or steps to enter”} | NA | $225.00 | $1,995.00 | NA | $350.00 | $100.00 | 2 | $0.00 | 1 | 1125 | 1 | 1 | 1125 | 1125 | 1 | 1125 | 4 days ago | TRUE | 25 | 55 | 80 | 355 | 2019-07-09 | 45 | 11 | 2009-11-21 | 2019-05-21 | 95 | 10 | 9 | 10 | 10 | 10 | 9 | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 2 | 1 | 0 | 1 | 0.38 |
| 3647 | https://www.airbnb.com/rooms/3647 | 2.019071e+13 | 2019-07-08 | THE VILLAGE OF HARLEM….NEW YORK ! | NA | WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY This Spacious 1 bedroom is with Plenty of Windows with a View……. Sleeps…..Four Adults…..two in the Livingrm. with (2) Sofa-beds. (Website hidden by Airbnb) two in the Bedrm.on a very Comfortable Queen Size Bed… A Complete Bathrm…..With Shower and Bathtub……. Fully Equipped with Linens & Towels…….. Spacious Living Room……Flat ScreenTelevision…..DVD Player with Movies available for your viewing during your stay………………………………………………………………….. Dining Area…..for Morning Coffee or Tea…………………………………………….. The Kitchen Area is Modern with Granite Counter Top… includes the use of a Coffee Maker…Microwave to Heat up a Carry Out/In Meal…. Not suited for a Gourmet Cook…or Top Chef……Sorry!!!! . This Flat is located in HISTORIC HARLEM…. near the Appollo Theater and The Museum Mile…on Fifth Avenue. Sylvia’s World Famous Resturant…loca | WELCOME TO OUR INTERNATIONAL URBAN COMMUNITY This Spacious 1 bedroom is with Plenty of Windows with a View……. Sleeps…..Four Adults…..two in the Livingrm. with (2) Sofa-beds. (Website hidden by Airbnb) two in the Bedrm.on a very Comfortable Queen Size Bed… A Complete Bathrm…..With Shower and Bathtub……. Fully Equipped with Linens & Towels…….. Spacious Living Room……Flat ScreenTelevision…..DVD Player with Movies available for your viewing during your stay………………………………………………………………….. Dining Area…..for Morning Coffee or Tea…………………………………………….. The Kitchen Area is Modern with Granite Counter Top… includes the use of a Coffee Maker…Microwave to Heat up a Carry Out/In Meal…. Not suited for a Gourmet Cook…or Top Chef……Sorry!!!! . This Flat is located in HISTORIC HARLEM…. near the Appollo Theater and The Museum Mile…on Fifth Avenue. Sylvia’s World Famous Resturant…loca | none | NA | NA | NA | NA | NA | Upon arrival please have a legibile copy of your Passport and / or State Photo. ID. as well as your confirmation letter. Please NO SMOKING …LOUD TALKING or PARTIES of any kind. Security Deposit and Cleaning Fees in CASH at time of arrival. Security deposit will be refunded within 72hrs pending no damages. .Cleaning fees are non-refundable. At Check Out… Please dispose of all trash/garbage in the bins located outside behind the entrance stairs. Please place all dirty linens and towels in the dirty laundry bin. Please don’t leave any dishes in the sink and dispose of all Plastic Dishes/Plastic Culinary. If you need a late check-out please contact the Emergency Contact Telephone Numbers that are listed in the Flat. PLEASE LEAVE ALL KEYS IN THE FLAT and BE CERTAIN THE DOORS ARE LOCKED. WE HOPE YOU ENJOYED YOUR STAY and will write positve comments and will visit again. | NA | NA | https://a0.muscache.com/im/pictures/838341/9b3c66f3_original.jpg?aki_policy=large | NA | 4632 | https://www.airbnb.com/users/show/4632 | Elisabeth | 2008-11-25 | New York, New York, United States | Make Up Artist National/ (Website hidden by Airbnb) Production. I m curently working with a Production Company in WDC and Coordinated a “Day Of Service” for the Presidential Innaugration 2013. I can’t live without Starbucks and Oprah’s Chai Tea Latte…and my circle of of great friends world wide. “BLESS ME INTO USEFULLNESS” is my daily prayer. I | within a day | 100% | N/A | FALSE | https://a0.muscache.com/im/users/4632/profile_pic/1328402497/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/4632/profile_pic/1328402497/original.jpg?aki_policy=profile_x_medium | Harlem | 1 | 1 | [‘email’, ‘phone’, ‘google’, ‘reviews’, ‘jumio’, ‘government_id’] | TRUE | TRUE | New York, NY, United States | Harlem | Harlem | Manhattan | New York | NY | 10027 | New York | New York, NY | US | United States | 40.80902 | -73.94190 | TRUE | Apartment | Private room | 2 | 1 | 1 | 1 | Pull-out Sofa | {“Cable TV”,Internet,Wifi,“Air conditioning”,Kitchen,“Buzzer/wireless intercom”,Heating,“Smoke detector”,“Carbon monoxide detector”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”} | NA | $150.00 | NA | NA | $200.00 | $75.00 | 2 | $20.00 | 3 | 7 | 3 | 3 | 7 | 7 | 3 | 7 | 34 months ago | TRUE | 30 | 60 | 90 | 365 | 2019-07-08 | 0 | 0 | NA | NA | NA | NA | NA | NA | NA | NA | NA | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 1 | 0 | 1 | 0 | NA |
| 3831 | https://www.airbnb.com/rooms/3831 | 2.019071e+13 | 2019-07-09 | Cozy Entire Floor of Brownstone | Urban retreat: enjoy 500 s.f. floor in 1899 brownstone, with wood and ceramic flooring throughout (completed Aug. 2015 through Sept. 2015), roomy bdrm, & upgraded kitchen & bathroom (completed Oct. 2015). It’s sunny and loaded with everything you need! | Greetings! We own a double-duplex brownstone in Clinton Hill on Gates near Classon Avenue - (7 blocks to C train, 5 blocks to G train, minutes to all), in which we host on the entire top floor of the upper duplex. This is more of an efficiency set-up: it is the top floor on a two-family, double duplex brownstone. The top floor for our guests consists of a sizable bedroom, full bath and eat-in kitchen for your exclusive use. Our family occupies the floors below. You go through a common hallway and staircase, to get to the top floor (2 easy flights up from the main entrance), but not through any rooms, so it is a fairly private set-up. - Clinton Hill, Gates Avenue near Classon Ave. (1 mi. or less to Williamsburg, Park Slope, Prospect Heights, downtown, Ft. Greene, Bed-Stuy, Bushwick; 20 mins to Manhattan) - includes FiOS, heat (or A/C), hot water, and electricity all included - furnished with two twin beds (convertible into a king bed), one rollaway twin bed and one inflatable | Urban retreat: enjoy 500 s.f. floor in 1899 brownstone, with wood and ceramic flooring throughout (completed Aug. 2015 through Sept. 2015), roomy bdrm, & upgraded kitchen & bathroom (completed Oct. 2015). It’s sunny and loaded with everything you need! Greetings! We own a double-duplex brownstone in Clinton Hill on Gates near Classon Avenue - (7 blocks to C train, 5 blocks to G train, minutes to all), in which we host on the entire top floor of the upper duplex. This is more of an efficiency set-up: it is the top floor on a two-family, double duplex brownstone. The top floor for our guests consists of a sizable bedroom, full bath and eat-in kitchen for your exclusive use. Our family occupies the floors below. You go through a common hallway and staircase, to get to the top floor (2 easy flights up from the main entrance), but not through any rooms, so it is a fairly private set-up. - Clinton Hill, Gates Avenue near Classon Ave. (1 mi. or less to Williamsburg, Park Slope, Pro | none | Just the right mix of urban center and local neighborhood; close to all but enough quiet for a calming walk. | NA | B52 bus for a 10-minute ride to downtown Brooklyn is a few yards away on the corner; G train/Classon Avenue is 5 blocks away; C train is about 6 blocks to either the Clinton/Washington stop or Franklin Avenue stop. There is on-street parking, alternate side is twice per week on the immediate block but only once per week on Classon. From LaGuardia Airport, a taxi will cost $30-$35, but there is also a bus that will put you at the Jackson Heights subway station, and from there it’s about 5 stops to catch the G train, which stops 5 blocks away. From JFK, the taxi is closer to $40, but the AirTran can get you conveniently to the A/C line and the C train is about 6 blocks from here. From JFK via subway/metro/train: From JFK take the AirTrain to Howard Beach to catch the A train toward Brooklyn/Manhattan. Take the A train to Utica Avenue and go across that same platform to catch the C local train (you could also transfer at Nostrand but you would have to carry luggage downstairs to cat | You will have exclusive use of and access to: a sizable private room as described in “The Space” section, furnished with two twin beds (which we will combine into one king bed upon request) and optional rollaway twin and/or inflatable beds, and other small furnishings; full private bath and private eat-in kitchen both renovated in Fall 2015; sizable dining table in sun-filled kitchen area doubles as a great desk space; alcove perfect for vertical bike storage. Upon request you may also have some use of the livingroom on the floor just below. | We’ll be around, but since you have the top floor to yourself, most of the interaction is on the way in or out - we’re open to socializing and did so frequently with our last long-term guests, so it’s really up to you | Smoking - outside please; pets allowed but please contact me first for arrangements | NA | NA | https://a0.muscache.com/im/pictures/e49999c2-9fd5-4ad5-b7cc-224deac989aa.jpg?aki_policy=large | NA | 4869 | https://www.airbnb.com/users/show/4869 | LisaRoxanne | 2008-12-07 | New York, New York, United States | Laid-back bi-coastal actor/professor/attorney. | within a few hours | 93% | N/A | FALSE | https://a0.muscache.com/im/users/4869/profile_pic/1371927771/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/4869/profile_pic/1371927771/original.jpg?aki_policy=profile_x_medium | Clinton Hill | 1 | 1 | [‘email’, ‘phone’, ‘reviews’, ‘kba’] | TRUE | TRUE | Brooklyn, NY, United States | Brooklyn | Clinton Hill | Brooklyn | Brooklyn | NY | 11238 | New York | Brooklyn, NY | US | United States | 40.68514 | -73.95976 | TRUE | Guest suite | Entire home/apt | 3 | 1 | 1 | 4 | Real Bed | {TV,“Cable TV”,Internet,Wifi,“Air conditioning”,Kitchen,“Pets allowed”,“Free street parking”,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self check-in”,Lockbox,Bathtub,“High chair”,“Stair gates”,“Children’s books and toys”,“Pack ’n Play/travel crib”,“Hot water”,“Luggage dropoff allowed”,“Long term stays allowed”} | 500 | $89.00 | $575.00 | $2,100.00 | $500.00 | NA | 1 | $0.00 | 1 | 730 | 1 | 1 | 730 | 730 | 1 | 730 | today | TRUE | 0 | 0 | 3 | 194 | 2019-07-09 | 270 | 69 | 2014-09-30 | 2019-07-05 | 90 | 10 | 9 | 10 | 10 | 10 | 9 | FALSE | NA | NA | FALSE | FALSE | moderate | FALSE | FALSE | 1 | 1 | 0 | 0 | 4.64 |
| 5022 | https://www.airbnb.com/rooms/5022 | 2.019071e+13 | 2019-07-08 | Entire Apt: Spacious Studio/Loft by central park | NA | Loft apartment with high ceiling and wood flooring located 10 minutes away from Central Park in Harlem - 1 block away from 6 train and 3 blocks from 2 & 3 line. This is in a recently renovated building which includes elevator, trash shoot. marble entrance and laundromat in the basement. The apartment is a spacious loft studio. The seating area and sleeping area is divided by a bookcase. There is a long hallway entrance where the bathroom and closet for your clothes is situated. The apartment is in mint condition, the walls have been freshly painted a few months ago. Supermarket, and 24 hour convenience store less than 1 block away. 1 block away from Hot Yoga Studio and NY Sports club facility. Perfect for anyone wanting to stay in Manhattan but get more space. 10 minutes away from midtown and 15 minutes away from downtown. The neighborhood is lively and diverse. You will need to travel at least 10 blocks to find cafe’s, restaurants etc.. There are a few restaurants on 100 street on | Loft apartment with high ceiling and wood flooring located 10 minutes away from Central Park in Harlem - 1 block away from 6 train and 3 blocks from 2 & 3 line. This is in a recently renovated building which includes elevator, trash shoot. marble entrance and laundromat in the basement. The apartment is a spacious loft studio. The seating area and sleeping area is divided by a bookcase. There is a long hallway entrance where the bathroom and closet for your clothes is situated. The apartment is in mint condition, the walls have been freshly painted a few months ago. Supermarket, and 24 hour convenience store less than 1 block away. 1 block away from Hot Yoga Studio and NY Sports club facility. Perfect for anyone wanting to stay in Manhattan but get more space. 10 minutes away from midtown and 15 minutes away from downtown. The neighborhood is lively and diverse. You will need to travel at least 10 blocks to find cafe’s, restaurants etc.. There are a few restaurants on 100 street on | none | NA | NA | NA | NA | NA | Please be considerate when staying in the apartment. This is a low key building and it’s important guest are respectful. You can come and go as you please I just ask that you keep a low profile. 1) Please be respectful of neighbors - no loud music after 10pm and keep a low profile 2) Do not open the door for anyone 3) Please keep the apt clean 4) No access to mailbox - please forward personal mail to job or school | NA | NA | https://a0.muscache.com/im/pictures/feb453bd-fdec-405c-8bfa-3f6963d827e9.jpg?aki_policy=large | NA | 7192 | https://www.airbnb.com/users/show/7192 | Laura | 2009-01-29 | Miami, Florida, United States | I have been a NYer for almost 10 years. I came to NY to study and never left. I work in the advertising industry and love to eat peanut butter & jelly sandwiches. | N/A | N/A | N/A | FALSE | https://a0.muscache.com/im/users/7192/profile_pic/1325651676/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/7192/profile_pic/1325651676/original.jpg?aki_policy=profile_x_medium | East Harlem | 1 | 1 | [‘email’, ‘phone’, ‘facebook’, ‘reviews’, ‘kba’] | TRUE | TRUE | New York, NY, United States | East Harlem | East Harlem | Manhattan | New York | NY | 10029 | New York | New York, NY | US | United States | 40.79851 | -73.94399 | TRUE | Apartment | Entire home/apt | 1 | 1 | NA | 1 | Real Bed | {Internet,Wifi,“Air conditioning”,Kitchen,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,Essentials,Shampoo,“Hair dryer”,“Hot water”,“Host greets you”} | NA | $80.00 | $600.00 | $1,600.00 | $100.00 | $80.00 | 1 | $20.00 | 10 | 120 | 10 | 10 | 120 | 120 | 10 | 120 | 3 months ago | TRUE | 0 | 0 | 0 | 0 | 2019-07-08 | 9 | 4 | 2012-03-20 | 2018-11-19 | 93 | 10 | 9 | 10 | 10 | 9 | 10 | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 1 | 1 | 0 | 0 | 0.10 |
| 5099 | https://www.airbnb.com/rooms/5099 | 2.019071e+13 | 2019-07-08 | Large Cozy 1 BR Apartment In Midtown East | My large 1 bedroom apartment is true New York City living. The apt is in midtown on the east side and centrally located, just a 10-minute walk from Grand Central Station, Empire State Building, Times Square. The kitchen and living room are large and bright with Apple TV. I have a new Queen Bed that sleeps 2 people, and a Queen Aero Bed that can sleep 2 people in the living room. The apartment is located on the 5th floor of a walk up - no elevator (lift). | I have a large 1 bedroom apartment centrally located in Midtown East. A 10 minute walk from Grand Central Station, Times Square, Empire State Building and all major subway and bus lines. The apartment is located on the 5th floor of a pre-war walk up building-no elevator/lift. The apartment is bright with has high ceilings and flow through rooms. A spacious, cozy living room with Netflix and Apple TV. A large bright kitchen to sit and enjoy coffee or tea. The bedroom is spacious with a comfortable queen size bed that sleeps 2. I have a comfortable queen size aero bed that fits in the living room and sleeps 2. It can be tucked away for living space and opened when ready for bed. I’d be happy to give you tips and advice on the best ways to experience the most of NYC. The apartment’s location is great for sightseeing. ** Check out my listing guidebook ** If you would like to stay local in the area, there is a very long & famous strip of bars and restaurants along 3rd Avenue, which | My large 1 bedroom apartment is true New York City living. The apt is in midtown on the east side and centrally located, just a 10-minute walk from Grand Central Station, Empire State Building, Times Square. The kitchen and living room are large and bright with Apple TV. I have a new Queen Bed that sleeps 2 people, and a Queen Aero Bed that can sleep 2 people in the living room. The apartment is located on the 5th floor of a walk up - no elevator (lift). I have a large 1 bedroom apartment centrally located in Midtown East. A 10 minute walk from Grand Central Station, Times Square, Empire State Building and all major subway and bus lines. The apartment is located on the 5th floor of a pre-war walk up building-no elevator/lift. The apartment is bright with has high ceilings and flow through rooms. A spacious, cozy living room with Netflix and Apple TV. A large bright kitchen to sit and enjoy coffee or tea. The bedroom is spacious with a comfortable queen size bed that sleeps 2. I | none | My neighborhood in Midtown East is called Murray Hill. The area is very centrally located with easy access to explore . The apartment is about 5 blocks (7 minute walk) to the United Nations and Grand Central Station the main and most historic train station. Grand Central will give you access to every train in the city. The apartment is also very close to main attractions, It’s about a 10 minute walk to both the Empire State Building and Times Square. There’s a great shopping area with dozens of stores including H&M, Zara, The Gap, BeBe and the world famous Macy’s department store. These shops are a 10 minute walk up East 34th Street from 5th avenue and 8th avenue. If you would like to stay local in the area, there is a very long & famous strip of bars and restaurants along 3rd avenue, which is just around the corner from the apartment. It’s commonly known as the 3rd avenue strip. | Read My Full Listing For All Information. New York City really is the city that doesn’t sleep. There’s a constant flow of people, bikes and cars. The city can be noisy at times, if you’re a light sleeper, ear plugs would help. Check out my local guide book for things to do. | From the apartment is a 10 minute walk to Grand Central Station on East 42nd Street, a 10 minute walk to the Empire State Building on East 34th Street and 5th Avenue, a 10 minute walk to Times Square on West 42 Street and about 20 minutes walk to Central Park on 59th Street. Depending on how long you are staying, I would recommend a 7 day unlimited metro card. This allows you to travel unlimited all day and night on any train or bus in and outside of the city. Grand Central Station is the main NYC train station. You can find any train connection and get anywhere in Manhattan from Grand Central Station. You can get to Brooklyn, Queens, The Bronx and Staten Island from Grand Central Station The M15 bus is around the corner on 2nd avenue. This bus will take you from uptown Harlem to the East Village and South Street Seaport. It will also take you to the Staten Island ferry and Statue of Liberty and everywhere in between along the east side. The M101 bus is just up the street on 3rd aven | I will meet you upon arrival. | I usually check in with guests via text or email. I’m available by text, email or phone call with any questions, suggestions or to help out. | • Check-in time is 2PM. • Check-out time is 12 PM. Please be respectful of the space and leave the apartment in the condition you were welcomed into. | NA | NA | https://a0.muscache.com/im/pictures/0790b1a5-8981-41cc-a370-fa2b982a8803.jpg?aki_policy=large | NA | 7322 | https://www.airbnb.com/users/show/7322 | Chris | 2009-02-02 | New York, New York, United States | I’m an artist, writer, traveler, and a native new yorker. Welcome to my city. | within an hour | 100% | N/A | FALSE | https://a0.muscache.com/im/pictures/user/26745d24-d818-4bf5-8f9e-26b097121ba7.jpg?aki_policy=profile_small | https://a0.muscache.com/im/pictures/user/26745d24-d818-4bf5-8f9e-26b097121ba7.jpg?aki_policy=profile_x_medium | Flatiron District | 1 | 1 | [‘email’, ‘phone’, ‘reviews’, ‘jumio’, ‘government_id’] | TRUE | FALSE | New York, NY, United States | Midtown East | Murray Hill | Manhattan | New York | NY | 10016 | New York | New York, NY | US | United States | 40.74767 | -73.97500 | FALSE | Apartment | Entire home/apt | 2 | 1 | 1 | 1 | Real Bed | {TV,“Cable TV”,Internet,Wifi,Kitchen,“Buzzer/wireless intercom”,Heating,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”,“Hot water”,“Bed linens”,“Extra pillows and blankets”,“Host greets you”} | NA | $200.00 | NA | NA | $300.00 | $125.00 | 2 | $100.00 | 3 | 21 | 3 | 3 | 21 | 21 | 3 | 21 | 3 weeks ago | TRUE | 23 | 48 | 48 | 129 | 2019-07-08 | 74 | 9 | 2009-04-20 | 2019-06-22 | 89 | 10 | 9 | 10 | 10 | 9 | 9 | FALSE | NA | NA | FALSE | FALSE | strict_14_with_grace_period | TRUE | TRUE | 1 | 1 | 0 | 0 | 0.59 |
In order to simplify the analysis, we only select 15 relevant variables and filter 2-bedroom properties in NYC to create a subset for the following section.
1. Needless character
‘$’ Value prefix of every price row prevents numeric manipulation. It is thus removed from three columns: Price, Weekly Price & Monthly Price.
2. Missing values
rev_missing_val <- sapply(airbnb, function(x) sum(length(which(is.na(x)))))
kable(as.data.frame(rev_missing_val)) %>%
kableExtra::kable_styling(bootstrap_options = "striped") %>%
kableExtra::scroll_box(width = "100%", height = "250px")
| rev_missing_val | |
|---|---|
| id | 0 |
| listing_url | 0 |
| scrape_id | 0 |
| last_scraped | 0 |
| name | 16 |
| summary | 2040 |
| space | 14026 |
| description | 780 |
| experiences_offered | 0 |
| neighborhood_overview | 17504 |
| notes | 28701 |
| transit | 17078 |
| access | 21698 |
| interaction | 20062 |
| house_rules | 18876 |
| thumbnail_url | 48895 |
| medium_url | 48895 |
| picture_url | 0 |
| xl_picture_url | 48895 |
| host_id | 0 |
| host_url | 0 |
| host_name | 21 |
| host_since | 21 |
| host_location | 164 |
| host_about | 18714 |
| host_response_time | 21 |
| host_response_rate | 21 |
| host_acceptance_rate | 21 |
| host_is_superhost | 21 |
| host_thumbnail_url | 21 |
| host_picture_url | 21 |
| host_neighbourhood | 6529 |
| host_listings_count | 21 |
| host_total_listings_count | 21 |
| host_verifications | 0 |
| host_has_profile_pic | 21 |
| host_identity_verified | 21 |
| street | 0 |
| neighbourhood | 12 |
| neighbourhood_cleansed | 0 |
| neighbourhood_group_cleansed | 0 |
| city | 62 |
| state | 6 |
| zipcode | 523 |
| market | 109 |
| smart_location | 0 |
| country_code | 0 |
| country | 0 |
| latitude | 0 |
| longitude | 0 |
| is_location_exact | 0 |
| property_type | 0 |
| room_type | 0 |
| accommodates | 0 |
| bathrooms | 56 |
| bedrooms | 22 |
| beds | 40 |
| bed_type | 0 |
| amenities | 0 |
| square_feet | 48487 |
| price | 0 |
| weekly_price | 42891 |
| monthly_price | 43647 |
| security_deposit | 17317 |
| cleaning_fee | 10645 |
| guests_included | 0 |
| extra_people | 0 |
| minimum_nights | 0 |
| maximum_nights | 0 |
| minimum_minimum_nights | 0 |
| maximum_minimum_nights | 0 |
| minimum_maximum_nights | 0 |
| maximum_maximum_nights | 0 |
| minimum_nights_avg_ntm | 0 |
| maximum_nights_avg_ntm | 0 |
| calendar_updated | 0 |
| has_availability | 0 |
| availability_30 | 0 |
| availability_60 | 0 |
| availability_90 | 0 |
| availability_365 | 0 |
| calendar_last_scraped | 0 |
| number_of_reviews | 0 |
| number_of_reviews_ltm | 0 |
| first_review | 10052 |
| last_review | 10052 |
| review_scores_rating | 11022 |
| review_scores_accuracy | 11060 |
| review_scores_cleanliness | 11043 |
| review_scores_checkin | 11078 |
| review_scores_communication | 11055 |
| review_scores_location | 11082 |
| review_scores_value | 11080 |
| requires_license | 0 |
| license | 48895 |
| jurisdiction_names | 48895 |
| instant_bookable | 0 |
| is_business_travel_ready | 0 |
| cancellation_policy | 0 |
| require_guest_profile_picture | 0 |
| require_guest_phone_verification | 0 |
| calculated_host_listings_count | 0 |
| calculated_host_listings_count_entire_homes | 0 |
| calculated_host_listings_count_private_rooms | 0 |
| calculated_host_listings_count_shared_rooms | 0 |
| reviews_per_month | 10052 |
3. Extreme values
There are a few of extreme values in price such as $9,000 or $0 per night. The reason for this issue could be a data input error of the hosts. We can simply remove these rows.
4. Ambiguous match between zipcode and neighborhood
Some zipcodes belong to more than one neighborhood in the dataset. To solve this problem, we keep the neighborhood where the zipcode normally belongs as the only match.
#Quality Check
#1. missing value of zip code of NYC in zillow
#solution: new data source can be connected in the future
#2. Remove needless character and convert character to numeric
chr_to_num <- function(col_name) {
col_name <- str_sub(col_name, 2, -4)
col_name <- str_replace_all(col_name, ",", "")
col_name <- as.numeric(col_name)
}
airbnb_subset$price <- chr_to_num(airbnb_subset$price)
airbnb_subset$weekly_price <- chr_to_num(airbnb_subset$weekly_price)
airbnb_subset$monthly_price <- chr_to_num(airbnb_subset$monthly_price)
#3. NAs in weekly price and monthly price
#solution: fill in the NAs with 7 and 31 times of daily price
missing_weekly <- which(is.na(airbnb_subset["weekly_price"]))
airbnb_subset$weekly_price[missing_weekly] <- airbnb_subset$price[missing_weekly]*7
missing_monthly <- which(is.na(airbnb_subset["monthly_price"]))
airbnb_subset$monthly_price[missing_monthly] <- airbnb_subset$price[missing_monthly]*30
#4. extreme values in price
#It can't be $9,999 or $0 a night.
#The reason for this could be a data input error of the host
airbnb_subset <- subset(airbnb_subset, price < 9000 & price != 0)
#5. zipcode: convert numeric to character
airbnb_subset$zipcode <- as.character(airbnb_subset$zipcode)
The two datasets have different units of time. In order to complete the analysis, we determine a common unit of time, which is month.
The revenue data is scraped in 2019-07, but the cost data only have historical house The revenue data is scraped in 2019-07, but the cost data only have historical house value from 1996-04 to 2017-06. Due to shortage of time, we assume that there is seasonality in the price and that values depend not only on previous values (Auto Regressive AR) but also on differences between previous values (Moving Average MA). So, we apply ARIMA model to predict the cost of the properties in zipcodes from 2017-07 to 2019-07. We then attach the price of property in 2019-07 calculated at zipcode level with each zipcode as a new column to cost data.
#determine a common unit of time - monthly
airbnb_subset$last_scraped <- format(airbnb_subset$last_scraped,format="%Y-%m")
#trend of house price
design.mat <- cbind(1,1:121)
response.mat <- t(zillow_subset[,3:257])
response.mat <- response.mat[complete.cases(response.mat),]
reg <- lm.fit(design.mat, response.mat)$coefficients
zillow_subset <- cbind(zillow_subset, t(reg))
names(zillow_subset)[names(zillow_subset)=="x1"]="intercept"
names(zillow_subset)[names(zillow_subset)=="x2"]="slope"
#forecast
scraped_date <- unique(airbnb_subset$last_scraped)
zillow_subset$scraped_date <- NULL
for(i in 1:nrow(zillow_subset)){
# Convert the monthly cost data into time series data
time_series = ts(as.vector(t(zillow_subset[,c(3:257)])[,i]),
start = c(1996,4),frequency = 12)
# Define ARIMA model to be used for prediction
ARIMAfit = arima(time_series, order=c(1,1,1), seasonal=list(order=c(1,0,1),
period=NA), method="ML")
# use the ARIMA model to predict the price from 2017-07 to 2019-07
pred = predict(ARIMAfit, n.ahead = 25)
# Store the predicted values in a variable
pred_val <- pred$pred
# set the value of current price for the specific zipcode as price in 2019-07
zillow_subset$scraped_date[i] <- pred_val[length(pred_val)]
}
In order to make sure this product is always applicable whenever new data is available or whenever we are ready to approach a new market, a function is built to link the data together in a scalable way by matching the scraped date and zipcode of two datasets.
After inner join, we recognize that there are 1508 properties of 24 zipcodes in 4 distinct neighborhoods - Manhattan, Queens, Brooklyn and Staten Island – matching together.
#link the data together in a scalable way
names(zillow_subset)[names(zillow_subset)=="RegionName"]="zipcode"
dt_full <- merge(airbnb_subset, zillow_subset, by = "zipcode")
dt <- select(dt_full, zipcode:RegionID, scraped_date, growth_rate = "slope")
Property type issue
If the property type == Private Room, the price of the property should be corrected by price*bedrooms
Average daily price
In order to calculate the average daily price, we need to consider both tenant booking habit and host requirement for minimum and maximum nights.
Based on the common sense of general booking habit, we assume that in a month 60% of bookings are single day bookings, 30% are weekly bookings and 10% of bookings are made for a month.
As for the minimum/maximum amount of nights the host is willing to rent out the property, we assume that the minimum/maximum nights determine whether a property can be directly booked for a weekly and for a month.
Considering these two factors, we can calculate the average daily price by a conditional metric of minimum/maximum nights and booking habit weights.
#Price corrected
#If the property type == Private Room, total price should be corrected by price*bedrooms
dt <- dt %>%
mutate(price = if_else(room_type == "Private room", price * 2, price))
#assumption: 0.6 daily price, 0.3 weekly price, 0.1 monthly price.
#This coefficient can change
daily_prob <- 0.6
weekly_prob <- 0.3
monthly_prob <- 0.1
dt$avg_daily_price <- 0
for(i in 1:nrow(dt)){
if(dt$minimum_nights[i]<7 & dt$maximum_nights[i]<7){
dt$avg_daily_price[i] = dt$price[i]
} else if (dt$minimum_nights[i]<7 & dt$maximum_nights[i]>=7 & dt$maximum_nights[i]<30){
dt$avg_daily_price[i] = daily_prob*dt$price[i]
+(weekly_prob+monthly_prob)*dt$weekly_price[i]/7
} else if (dt$minimum_nights[i]<7 & dt$maximum_nights[i]>=30){
dt$avg_daily_price[i] = daily_prob*dt$price[i]+weekly_prob*dt$weekly_price[i]/7+
monthly_prob*dt$monthly_price[i]/30
} else if (dt$minimum_nights[i]>=7 & dt$minimum_nights[i]<30
& dt$maximum_nights[i]>=7 & dt$maximum_nights[i]<30){
dt$avg_daily_price[i] = dt$weekly_price[i]/7
} else if (dt$minimum_nights[i]>=7 & dt$minimum_nights[i]<30
& dt$maximum_nights[i]>=30){
dt$avg_daily_price[i] = (daily_prob+weekly_prob)*dt$weekly_price[i]/7
+monthly_prob*dt$monthly_price[i]/30
} else if (dt$minimum_nights[i]>=30 & dt$maximum_nights[i]>=30){
dt$avg_daily_price[i] = dt$monthly_price[i]/30
}
}
occupancy_rate <- 0.75
dt$monthly_rev <- dt$avg_daily_price*30*occupancy_rate
dt$annual_rev <- dt$monthly_rev*12
#summarize by zipcode
zip_summary <- dt %>%
group_by(zipcode) %>%
summarize(count = n(),
avg_monthly_rev = round(mean(monthly_rev),2),
avg_annual_rev = round(mean(annual_rev),2),
avg_cost = round(mean(scraped_date),2),
payback_year = round(mean(avg_cost/avg_annual_rev),2),
rev_cost_ratio = round(avg_annual_rev/avg_cost,2),
trend = round(mean(growth_rate),2),
neighbourhood_group_cleansed =
paste(unique(neighbourhood_group_cleansed), collapse = ","))
#some zipcodes belong to more than one neighborhood
zip_summary <- zip_summary %>%
separate(neighbourhood_group_cleansed, c("neighbourhood", NA), sep = ",")
In order to smoothen the process of choosing the zipcodes that are the best to invest in, the analysis going forward considers several key factors, which will finally be integrated to choose the top zipcodes.
Zipcodes are evaluated on the following conditions:
Number of properties
Cost of properties
Average daily price
Annual revenue
Payback year
Revenue cost ratio
Growth rate of price
Higher the number of properties, more choices our client can invest, and more rental activities can possibly have in the area.
At the neighborhood level, Manhattan and Brooklyn host highest number of properties. Queens and Staten Island have much fewer number of properties.
Zipcodes 11215, 10036, 10003, 11217, 10025, 10013 have more than 100 properties in each area, which makes them the top 10 based on volume of properties.
#number of properties
num_house_plot <-
zip_summary %>%
arrange(desc(count)) %>%
top_n(10, count) %>%
ggplot(aes(x = reorder(zipcode, count), y = count, fill = neighbourhood))+
geom_col()+
coord_flip()+
labs(x = "Zipcode", y = "Number of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1))+
guides(fill = guide_legend(title = "Neighbourhood"))
ggplotly(num_house_plot)
One of the primary constraints in decision of investment is the cost. Lower the cost, company can save a lot more and reach ROI faster.
Zipcodes in Manhattan walk away with highest property cost, with an average of about $2M. Brooklyn comes second, with about $1M. Zipcodes in Staten Island and Queens have much lower prices, with lower than $0.5M.
cost_plot <-
zip_summary %>%
arrange(desc(avg_cost)) %>%
top_n(24, avg_cost) %>%
ggplot(aes(x = reorder(zipcode, avg_cost), y = avg_cost, fill = neighbourhood))+
geom_col()+
coord_flip()+
labs(x = "Zipcode", y = "Cost of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(hjust = 1))+
scale_y_continuous(breaks=c(0, 1000000, 2000000), labels=c("0","1M","2M"))+
guides(fill = guide_legend(title = "Neighbourhood"))
ggplotly(cost_plot)
Higher the average daily rental price, faster is the cash flow. Therefore, company would look forward to making quick bucks with high daily price.
Manhattan has a wider spread with higher price ranging to $50-$1000 per night.
Brooklyn comes second also with wide spread but lower price comparing to Manhattan.
Staten Island and Queens have lower price and narrower distribution, and this is probably because of limited sample size.
price_plot <- ggplot(dt,aes(x = reorder(zipcode, avg_daily_price),
y = avg_daily_price,
fill = neighbourhood_group_cleansed))+
geom_boxplot()+
coord_flip()+
labs(x = "Zipcode", y = "Annual Daily Price of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(hjust = 1))+
guides(fill = guide_legend(title = "Neighbourhood")) +
scale_y_continuous(limits = quantile(dt$avg_daily_price, c(0, 0.99)))
ggplotly(price_plot)
Different from the daily price, the annual revenue also taken into account the effects of weekly/monthly discount prices, minimum/maximum days and occupancy rate. Therefore, the annual revenue will provide a more comprehensive information to help us make better investment choices.
As usual, Manhattan has the highest revenue, and Brooklyn comes second.
rev_plot <-
zip_summary %>%
arrange(desc(avg_annual_rev)) %>%
top_n(10, avg_annual_rev) %>%
ggplot(aes(x = reorder(zipcode, avg_annual_rev),
y = avg_annual_rev, fill = neighbourhood))+
geom_col()+
coord_flip()+
labs(x = "Zipcode", y = "Annual Revenue of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(hjust = 1))+
scale_y_continuous(breaks=c(0, 100000, 200000,300000),
labels=c("0","1M","2M","3M"))+
guides(fill = guide_legend(title = "Neighbourhood"))
ggplotly(rev_plot)
Companies also value payback year when making investment decisions. The shorter the payback time, the faster the company can recoup its initial capital.
The zipcode 10306 in Staten Island has the shortest payback year (12 years), while nearly all the zipcodes in Manhattan and Brooklyn have more than 20 years of payback time.
#payback year
payback_plot <-
zip_summary %>%
arrange(desc(payback_year)) %>%
top_n(10, payback_year) %>%
ggplot(aes(x = reorder(zipcode, payback_year),y = payback_year, fill = neighbourhood))+
geom_col()+
coord_flip()+
labs(x = "Zipcode", y = "Payback Year of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(hjust = 1))+
guides(fill = guide_legend(title = "Neighbourhood"))
ggplotly(payback_plot)
Company should invest in those zipcodes which provide high return, which also means high revenue to cost ratio.
We notice that zipcode 11434, 10308 and 10306 in Queens and Staten Island provide the highest return for each dollar spent in buying the property, which is about 8%. Then comes to 11234 in Brooklyn. Most of Manhattan’s zipcodes have low return rates.
#profitability
profit_plot <-
zip_summary %>%
arrange(desc(rev_cost_ratio)) %>%
top_n(10, rev_cost_ratio) %>%
ggplot(aes(x = reorder(zipcode,rev_cost_ratio),
y = rev_cost_ratio, fill = neighbourhood))+
geom_col()+
coord_flip()+
labs(x = "Zipcode", y = "Profitability of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(hjust = 1))+
guides(fill = guide_legend(title = "Neighbourhood"))
ggplotly(profit_plot)
When making investment decisions, companies should not only consider the current market situation, but should also have a developing vision. Price trend in the real estate industry is also important factors to consider.
From the time series plot, we can see that after 2010, prices in four neighborhoods are all increasing. Staten Island and Queens have a slightly higher increasing rate.
#growth rate
#summarize by neighborhood
neighbor_summary <- dt_full %>%
group_by(neighbourhood_group_cleansed) %>%
summarize_each(funs(mean)) %>%
select(neighbourhood_group_cleansed, contains("-"))
t_neighbor_summary <- data.frame(t(neighbor_summary))
t_neighbor_summary <-t_neighbor_summary[-1,]
names(t_neighbor_summary)[1:4]=c("Brooklyn", "Manhattan", "Queens", "Staten Island")
t_neighbor_summary$date_ts <- seq(as.Date("1996-04-01"), as.Date("2017-06-01"),by = "month")
t_neighbor_summary$Brooklyn <- as.numeric(paste(t_neighbor_summary$Brooklyn))
t_neighbor_summary$Manhattan <- as.numeric(paste(t_neighbor_summary$Manhattan))
t_neighbor_summary$Queens <- as.numeric(paste(t_neighbor_summary$Queens))
t_neighbor_summary$`Staten Island` <- as.numeric(paste(t_neighbor_summary$`Staten Island`))
trend_plot <- plot_ly(x = t_neighbor_summary$date_ts, y = t_neighbor_summary$Brooklyn,
type="scatter", mode="lines", name = 'Brooklyn', fill = "Paired") %>%
add_trace(p, x = t_neighbor_summary$date_ts, y = t_neighbor_summary$Manhattan,
type="scatter", mode="lines", name = 'Manhattan', fill = "Paired") %>%
add_trace(p, x = t_neighbor_summary$date_ts, y = t_neighbor_summary$Queens,
type="scatter", mode="lines", name = 'Queens', fill = "Paired") %>%
add_trace(p, x = t_neighbor_summary$date_ts, y = t_neighbor_summary$`Staten Island`,
type="scatter", mode="lines", name = 'Staten Island', fill = "Paired")
ggplotly(trend_plot)
Since there are multiple factors mentioned above influencing investment decisions, we can construct a scoring metrics to comprehensively evaluate these factors and score the different factors according to the set weights, and finally get a normalized score to select the best zipcodes.
When the weight of count, cost, revenue, profit, payback year and trend are initially set as 0.2, 0.2, 0.1, 0.3, 0.1, 0.1, the top 10 zipcodes with the highest score are: 11434, 10306, 11215, 10036, 10303, 10304, 10308, 10305, 10314, 11234. Six of them are located in Staten Island, two are in Brooklyn, one in Manhattan and one in Queens.
#scoring matric
weight_count <- 0.2
weight_cost <- 0.2
weight_rev <- 0.1
weight_payback <- 0.1
weight_profit <- 0.3
weight_trend <- 1-weight_count-weight_cost-weight_rev-weight_payback-weight_profit
zip_summary$score <- round(weight_count*(zip_summary$count/mean(zip_summary$count))+
weight_cost*(1/(zip_summary$avg_cost/mean(zip_summary$avg_cost)))+ weight_rev*(zip_summary$avg_annual_rev/mean(zip_summary$avg_annual_rev))+
weight_payback*(1/(zip_summary$payback_year/mean(zip_summary$payback_year)))+ weight_profit*(zip_summary$rev_cost_ratio/mean(zip_summary$rev_cost_ratio))+
weight_trend*(zip_summary$trend/mean(zip_summary$trend)),2)
score_plot <-
zip_summary %>%
arrange(desc(score)) %>%
top_n(10, score) %>%
ggplot(aes(x = reorder(zipcode, score), y = score, fill = neighbourhood))+
geom_col()+
coord_flip()+
labs(x = "Zipcode", y = "Score of Properties")+
theme_bw() +
theme(plot.background = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),panel.border = element_blank(),
axis.text.x = element_text(hjust = 1))+
guides(fill = guide_legend(title = "Neighbourhood"))
ggplotly(score_plot)
From the invest map, we can intuitively see which zipcodes areas are most worthy of investment. Properties in different areas are shown in different colored dots. The size of the points represents the estimated annual revenue for the next year.
You can see that these zipcodes are scattered across four different areas of New York. Staten island has the most zipcodes, which are located on the east and west sides. The Carroll Gardens and Marine Park areas in Northwest Brooklyn as well as Jamaica area in Queens are also good choices, while In Manhattan, properties near the Times square in midtown are the best investments.
weight_count <- 0.2
weight_cost <- 0.2
weight_rev <- 0.1
weight_payback <- 0.1
weight_profit <- 0.3
weight_trend <- 1-weight_count-weight_cost-weight_rev-weight_payback-weight_profit
zip_summary$score <- round(weight_count*(zip_summary$count/mean(zip_summary$count))+
weight_cost*(1/(zip_summary$avg_cost/mean(zip_summary$avg_cost)))+
weight_rev*(zip_summary$avg_annual_rev/mean(zip_summary$avg_annual_rev))+
weight_payback*(1/(zip_summary$payback_year/mean(zip_summary$payback_year)))+
weight_profit*(zip_summary$rev_cost_ratio/mean(zip_summary$rev_cost_ratio))+
weight_trend*(zip_summary$trend/mean(zip_summary$trend)),2)
top_score <- head(arrange(zip_summary,desc(score)),n=10)
top_score <- top_score$zipcode
dt_top <- dt %>% filter(zipcode %in% top_score)
map_text <- paste("zipcode: ",dt_top$zipcode,"<br/>",
"longitude: ",dt_top$longitude,"<br/>",
"latitude: ", dt_top$latitude,"<br/>",
"daily price: ",dt_top$avg_daily_price, sep="") %>%
lapply(htmltools::HTML)
pal <- colorFactor(palette = "plasma", domain = dt$zipcode)
leaflet(dt_top) %>%
addTiles() %>%
setView( lng = -73.97, lat = 40.7, zoom = 10 ) %>%
addProviderTiles("OpenStreetMap.Mapnik") %>%
addCircleMarkers(~longitude, ~latitude, radius = 5,
color = ~pal(zipcode),
stroke = FALSE, fillOpacity = 0.6,
label = map_text,
labelOptions =
labelOptions(style = list("font-weight" = "normal",
padding = "3px 8px"), textsize = "13px", direction = "auto")
) %>%
addLegend( pal= pal , values=~zipcode, opacity=0.9, title = "Zipcode",
position = "bottomright")
Number of properties & Cost of properties
Zipcodes 11215, 10303, 11434, 10304 and 10306 have substantial number of properties at low cost, which are located in Brooklyn, Staten Island and Queens. So, if the company has budget constraints then they should invest in buying properties in these zipcodes.
Annual revenue & Revenue cost ratio
If the company is willing to buy properties having high cost, then they should invest in Zipcodes 10038 in Staten Island, 11215 in Brooklyn, 10036, 10003 and 10025 in Manhattan because these zipcodes not only have high number of costly properties but also provide very high return as the revenue is high.
Payback year
If the company is more focused on getting its money back quickly with a limited initial investment, zipcodes 10038, 10314, 11434, 10303 and 10306 fit the bill, which are all located in Staten Island and Queens.
Price trend
If the company attaches importance on long-term growth and future earnings and wants to invest in properties where rents are rising faster, these are the zipcodes to go: 10013, 10014, 10011 in Manhattan, 10308 in Staten Island, and 11201 in Brooklyn.
Our basic recommendation for the company is to diversify and buy properties in top performing zips of different neighborhoods with prime focus on Staten Island.
Through our products, the company can adjust the weight of each factor according to its actual needs to obtain the most suitable investment portfolio plan for its own investment strategy.
If the company is a risk taker, it can choose the property with high investment and high yield. If the company is risk averse, it can choose the property with low initial cost and fast return.
Company should diversify its investment portfolios and locations to not only minimize investment risk, but also lay the foundation for future expansion in multiple markets.
New York hosts 176 zipcodes, data can be further enriched to account for rest of the zipcodes. This would give the company more opportunities to diverse the investment portfolio.
Future revenue prediction: In the present analysis, we used average daily price * 360 * occupancy rate to calculate the annual revenue. It is true based on the assumption that the rental revenue will not dramatically change within a year. However, since the house price is continuingly increasing monthly, the rental price may also increase by time. Future analysis can consider the price trend of rental price in order to calculate income more accurately and realistically.
Occupancy rate: In the present analysis, we simply assume the occupancy rate is 0.75 for every property in every zipcode. However, in reality, there may be many factors which can impact occupancy rate, such as review score, location and room quality. In the future, we could customize occupancy rate of each property by building a regression model.
Interest rate: In this case, we have taken 0% discount rate as our assumption, but this assumption is not realistic. Some reasonable percentage rate can be taken to calculate NPV value and make a more accurate prediction.
Text analytics on ignored description columns from revenue data: This would open insights about other metrics that drive customer to book an AirBnb property for rental such as access to public transport, parking space, etc.